Smart Paper Notes

On the Importance of Critical Period in Multi-stage Reinforcement Learning

Junseok Park , Inwoo Hwang , Min Whoo Lee , Hyunseok Oh , Minsu Lee , Youngki Lee , Byoung-Tak Zhang

Categories: Artificial Intelligence | Machine Learning | Neural and Evolutionary Computing

2022-08-09

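The first years of an infant's life are known as the critical period, during which the overall development of learning performance is significantly affected by neural plasticity. In recent studies, AI agents with deep neural networks that mimic the mechanisms of actual neurons have exhibited a learning period similar to the human critical period. During this initial period in particular, appropriate stimuli play a vital role in developing learning ability. However, transforming human cognitive bias into an appropriate shaping reward is very challenging, and prior work on the critical period has not focused on finding the appropriate stimulus. To take a step further, we propose multi-stage reinforcement learning that emphasizes finding the "appropriate stimulus" around the critical period. Inspired by the early cognitive-developmental stages of humans, we use multi-stage guidance near the critical period and demonstrate the appropriate shaping reward (stage-2 guidance) in terms of the AI agent's performance, efficiency, and stability.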
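As a rough illustration of guidance confined to a stage around a critical period, the sketch below adds a shaping bonus only inside a fixed window of training steps. The function name, the window boundaries, and the bonus value are hypothetical; the abstract does not specify any of them.

```python
def staged_reward(env_reward, step, shaping_bonus, guide_start, guide_end):
    """Return the training reward for one step.

    Hypothetical illustration: the shaping bonus is added only while
    guide_start <= step < guide_end (the assumed guidance stage).
    Outside that window the agent sees the raw environment reward.
    """
    if guide_start <= step < guide_end:
        return env_reward + shaping_bonus
    return env_reward


# Made-up schedule: guidance is active between steps 100 and 200.
rewards = [staged_reward(0.0, s, 0.5, 100, 200) for s in (50, 150, 250)]
```

Only the middle step falls inside the assumed guidance window, so only that step receives the bonus.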

translated by Google Translate

Toddler-Guidance Learning: Impacts of Critical Period on Multimodal AI Agents

Junseok Park , Kwanyoung Park , Hyunseok Oh , Ganghun Lee , Minsu Lee , Youngki Lee , Byoung-Tak Zhang

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-01-12

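Critical periods are phases during which a toddler's brain develops in spurts. To promote children's cognitive development, proper guidance at this stage is critical. However, it is not clear whether such a critical period also exists for the training of AI agents. As with human toddlers, sequential guidance and multimodal interaction might significantly improve the training efficiency of AI agents. To validate this hypothesis, we adapt the notion of critical periods to learning in AI agents and investigate the critical period in a virtual environment for AI agents. We formalize the critical period and toddler-guidance learning in the reinforcement learning (RL) framework. We then build a toddler-like environment with the VECA toolkit to mimic the learning characteristics of human toddlers. We study three discrete levels of mutual interaction: weak mentor guidance (sparse reward), moderate mentor guidance (helper reward), and mentor demonstration (behavioral cloning). We also introduce the EAVE dataset, consisting of 30,000 real-world images, to fully reflect the toddler's viewpoint. We evaluate the impact of critical periods on AI agents from two perspectives: how and when they are best guided in both unimodal and multimodal learning. Our experimental results show that both unimodal and multimodal agents with moderate mentor guidance and a critical period at 1 million and 2 million training steps show a noticeable improvement. We validate these results with transfer learning on the EAVE dataset and find a performance gain under the same critical period and guidance.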

MAUVE Scores for Generative Models: Theory and Practice

Krishna Pillutla , Lang Liu , John Thickstun , Sean Welleck , Swabha Swayamdipta , Rowan Zellers , Sewoong Oh , Yejin Choi , Zaid Harchaoui

Categories: Machine Learning | Artificial Intelligence | Natural Language Processing

2022-12-30

Generative AI has matured to a point where large-scale models can generate text that seems indistinguishable from human-written text and remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better models. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore four approaches to statistically estimate these scores: vector quantization, non-parametric estimation, classifier-based estimation, and parametric Gaussian approximations. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We conclude the paper by demonstrating its applications to other AI domains and discussing practical recommendations.
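To make the idea of a divergence frontier concrete, here is a minimal sketch for two discrete distributions, such as histograms obtained after vector quantization. It assumes one common formulation: each frontier point compares both distributions to a mixture `r = lam*p + (1-lam)*q` using KL divergence, and the two coordinates reflect the two types of error. The function names and the scaling constant `c` are hypothetical choices, and this is not the authors' implementation.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_frontier(p, q, num_points=25, c=1.0):
    """Trace (exp(-c*KL(q||r)), exp(-c*KL(p||r))) over mixtures of p and q.

    Assumed formulation: r = lam*p + (1-lam)*q for lam on a grid in (0, 1).
    """
    points = []
    for k in range(1, num_points + 1):
        lam = k / (num_points + 1)  # stays strictly inside (0, 1)
        r = [lam * pi + (1.0 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-c * kl(q, r)), math.exp(-c * kl(p, r))))
    return points


# Identical distributions sit at (1, 1); mismatched ones fall below it.
same = divergence_frontier([0.5, 0.5], [0.5, 0.5])
diff = divergence_frontier([0.9, 0.1], [0.1, 0.9])
```

A scalar score can then be read off this curve as a summary statistic; which summary to use is left open here.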

Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

Taehyun Hwang , Min-hwan Oh

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-27

We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models remains very limited. In this paper, we establish a provably efficient RL algorithm for the MDP whose state transition is given by a multinomial logistic model. To balance the exploration-exploitation trade-off, we propose an upper confidence bound-based algorithm. We show that our proposed algorithm achieves a regret bound of $\tilde{\mathcal{O}}(d \sqrt{H^3 T})$, where $d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic function approximation with provable guarantees. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms the existing methods, hence achieving both provable efficiency and superior practical performance.
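A multinomial logistic transition model can be sketched as a softmax over the candidate next states, with each logit given by the inner product of a feature vector and the transition core. The sketch below assumes one feature vector per candidate next state; the function name and the example features are hypothetical and are not taken from the paper.

```python
import math


def mnl_transition_probs(features, theta):
    """Softmax transition probabilities under an assumed multinomial logistic model.

    features: one feature vector per candidate next state (hypothetical layout).
    theta: the transition core, a vector of the same dimension.
    """
    logits = [sum(f_i * t_i for f_i, t_i in zip(f, theta)) for f in features]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three candidate next states described by 2-dimensional features.
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
uniform = mnl_transition_probs(feats, [0.0, 0.0])  # zero core: uniform
skewed = mnl_transition_probs(feats, [2.0, 0.0])   # favors the first state
```

With a zero transition core every next state is equally likely; a nonzero core shifts probability toward states whose features align with it.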

Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

Byung-Doh Oh , William Schuler

Categories: Natural Language Processing

2022-12-23

This work presents a detailed linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently released five GPT-Neo variants and eight OPT variants on two separate datasets, replicating earlier results limited to just GPT-2 (Oh et al., 2022). Subsequently, analysis of residual errors reveals a systematic deviation of the larger variants, such as underpredicting reading times of named entities and making compensatory overpredictions for reading times of function words such as modals and conjunctions. These results suggest that the propensity of larger Transformer-based models to 'memorize' sequences during training makes their surprisal estimates diverge from humanlike expectations, which warrants caution in using pre-trained language models to study human language processing.
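For reference, surprisal and perplexity are both derived from a language model's per-token probabilities: surprisal is the negative log probability of a token given its context, and perplexity is the exponentiated mean surprisal. The sketch below uses base-2 logarithms and made-up probabilities; the function names are illustrative only.

```python
import math


def surprisals(token_probs):
    """Per-token surprisal in bits: -log2 P(token | context)."""
    return [-math.log2(p) for p in token_probs]


def perplexity(token_probs):
    """Perplexity as 2 raised to the mean surprisal in bits."""
    s = surprisals(token_probs)
    return 2.0 ** (sum(s) / len(s))


# Made-up conditional probabilities for a four-token sequence.
probs = [0.5, 0.25, 0.25, 0.5]
bits = surprisals(probs)
ppl = perplexity(probs)
```

In reading-time studies of this kind, per-word surprisal values like `bits` serve as predictors in a regression model, while perplexity summarizes the language model's overall fit to the text.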

Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation

Gyan Tatiya , Jonathan Francis , Luca Bondi , Ingrid Navarro , Eric Nyberg , Jivko Sinapov , Jean Oh

Categories: Robotics | Artificial Intelligence | Computer Vision

2022-12-21

Generalisation to unseen contexts remains a challenge for embodied navigation agents. In the context of semantic audio-visual navigation (SAVi) tasks, the notion of generalisation should include both generalising to unseen indoor visual scenes and generalising to unheard sounding objects. However, previous SAVi task definitions do not include evaluation conditions on truly novel sounding objects, resorting instead to evaluating agents on unheard sound clips of known objects; meanwhile, previous SAVi methods do not include explicit mechanisms for incorporating domain knowledge about object and region semantics. These weaknesses limit the development and assessment of models' abilities to generalise their learned experience. In this work, we introduce the use of knowledge-driven scene priors in the semantic audio-visual embodied navigation task: we combine semantic information from our novel knowledge graph that encodes object-region relations, spatial knowledge from dual Graph Encoder Networks, and background knowledge from a series of pre-training tasks -- all within a reinforcement learning framework for audio-visual navigation. We also define a new audio-visual navigation sub-task, where agents are evaluated on novel sounding objects, as opposed to unheard clips of known objects. We show improvements over strong baselines in generalisation to unseen regions and novel sounding objects, within the Habitat-Matterport3D simulation environment, under the SoundSpaces task.

Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal

Byung-Doh Oh , William Schuler

Categories: Natural Language Processing

2022-12-21

Transformer-based large language models are trained to make predictions about the next word by aggregating representations of previous tokens through their self-attention mechanism. In the field of cognitive modeling, such attention patterns have recently been interpreted as embodying the process of cue-based retrieval, in which attention over multiple targets is taken to generate interference and latency during retrieval. Under this framework, this work first defines an entropy-based predictor that quantifies the diffuseness of self-attention, as well as distance-based predictors that capture the incremental change in attention patterns across timesteps. Moreover, following recent studies that question the informativeness of attention weights, we also experiment with alternative methods for incorporating vector norms into attention weights. Regression experiments using predictors calculated from the GPT-2 language model show that these predictors deliver a substantially better fit to held-out self-paced reading and eye-tracking data over a rigorous baseline including GPT-2 surprisal. Additionally, the distance-based predictors generally demonstrated higher predictive power, with effect sizes of up to 6.59 ms per standard deviation on self-paced reading times (compared to 2.82 ms for surprisal) and 1.05 ms per standard deviation on eye-gaze durations (compared to 3.81 ms for surprisal).
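The two kinds of predictor can be illustrated on plain attention weight vectors. The sketch below is a simplified stand-in and not the paper's exact definitions: diffuseness is measured as the Shannon entropy of one attention distribution, and change across timesteps as the L1 distance between consecutive attention vectors, with the earlier vector zero-padded for the newly added token. Both function names and the padding choice are assumptions.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (bits) of one attention distribution; higher = more diffuse."""
    return -sum(w * math.log2(w) for w in weights if w > 0)


def attention_shift(prev_weights, curr_weights):
    """L1 distance between consecutive attention vectors.

    Assumption: the earlier vector is zero-padded to cover the new token.
    """
    pad = [0.0] * (len(curr_weights) - len(prev_weights))
    padded = list(prev_weights) + pad
    return sum(abs(a - b) for a, b in zip(padded, curr_weights))


# Made-up attention weights over previous tokens at two timesteps.
focused = attention_entropy([1.0])                      # all mass on one token
diffuse = attention_entropy([0.25, 0.25, 0.25, 0.25])   # evenly spread
shift = attention_shift([1.0], [0.5, 0.5])
```

Under this toy definition, attention spread evenly over four tokens yields 2 bits of entropy, while fully focused attention yields none.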

Can Current Task-oriented Dialogue Models Automate Real-world Scenarios in the Wild?

Sang-Woo Lee , Sungdong Kim , Donghyeon Ko , Donghoon Ham , Youngki Hong , Shin Ah Oh , Hyunhoon Jung , Wangkyo Jung , Kyunghyun Cho , Donghyun Kwak

Categories: Natural Language Processing

2022-12-20

Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-world scenarios and that the current TOD models are still a long way from unraveling the scenarios. In this position paper, we first identify current status and limitations of SF-TOD systems. After that, we explore the WebTOD framework, the alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model.

Evaluating Multimodal Interaction of Robots Assisting Older Adults

Afagh Mehri Shervedani , Ki-Hwan Oh , Bahareh Abbasi , Natawut Monaikul , Zhanibek Rysbek , Barbara Di Eugenio , Milos Zefran

Categories: Robotics

2022-12-20

We outline our work on evaluating robots that assist older adults by engaging with them through multiple modalities that include physical interaction. Our thesis is that to increase the effectiveness of assistive robots: 1) robots need to understand and effect multimodal actions, and 2) robots should not only react to the human but also take the initiative and lead the task when necessary. We start by briefly introducing our proposed framework for multimodal interaction and then describe two different experiments with the actual robots. In the first experiment, a Baxter robot helps a human find and locate an object using the Multimodal Interaction Manager (MIM) framework. In the second experiment, a NAO robot is used in the same task; however, the roles of the robot and the human are reversed. We discuss the evaluation methods that were used in these experiments, including different metrics employed to characterize the performance of the robot in each case. We conclude by providing our perspective on the challenges and opportunities for the evaluation of assistive robots for older adults in realistic settings.

Tracking by Associating Clips

Sanghyun Woo , Kwanyong Park , Seoung Wug Oh , In So Kweon , Joon-Young Lee

Categories: Computer Vision

2022-12-20

The tracking-by-detection paradigm has become the dominant method for multi-object tracking; it works by detecting objects in each frame and then performing data association across frames. However, its sequential frame-wise matching fundamentally suffers from intermediate interruptions in a video, such as object occlusions, fast camera movements, and abrupt lighting changes. Moreover, it typically overlooks temporal information beyond the two frames being matched. In this paper, we investigate an alternative by treating object association as clip-wise matching. Our new perspective views a single long video sequence as multiple short clips, and tracking is then performed both within and between the clips. The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation or propagation, as the video chunking allows bypassing the interrupted frames, and the short clip tracking avoids the conventional error-prone long-term track memory management. Second, multi-frame information is aggregated during the clip-wise matching, resulting in a more accurate long-range track association than the current frame-wise matching. Taking the state-of-the-art tracking-by-detection tracker QDTrack as a base, we showcase how the tracking performance improves with our new tracking formulation. We evaluate our proposals on two tracking benchmarks, TAO and MOT17, which have complementary characteristics and challenges.
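The first step of the clip-wise view, treating one long video as multiple short clips, can be sketched as a simple index split. The function name is hypothetical, and the choice of consecutive, non-overlapping clips of fixed length is an assumption; the abstract does not say how clips are formed.

```python
def split_into_clips(num_frames, clip_len):
    """Return (start, end) frame ranges, end exclusive, for consecutive clips.

    Assumption: clips do not overlap and the last clip may be shorter.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [
        (start, min(start + clip_len, num_frames))
        for start in range(0, num_frames, clip_len)
    ]


# A 10-frame video split into clips of at most 4 frames.
clips = split_into_clips(10, 4)
```

Tracking would then run within each range, followed by an association step between tracks from different clips, which is not shown here.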
